NSF PAR Search | NSF Public Access Repository

Obtaining personalized predictions from a randomized controlled trial on Alzheimer’s disease

https://doi.org/10.1038/s41598-024-84687-4

Shen, Dennis; Agarwal, Anish; Misra, Vishal; Schelter, Bjoern; Shah, Devavrat; Shiells, Helen; Wischik, Claude (December 2025, Scientific Reports)

Free, publicly-accessible full text available December 1, 2026
Network Synthetic Interventions: A Causal Framework for Panel Data Under Network Interference

Agarwal, Anish; Cen, Sarah; Shah, Devavrat; Lee, Christina (October 2023, https://arxiv.org/pdf/2210.11355)

We propose a generalization of the synthetic controls and synthetic interventions methodology to incorporate network interference. We consider the estimation of unit-specific potential outcomes from panel data in the presence of spillover across units and unobserved confounding. Key to our approach is a novel latent factor model that takes into account network interference and generalizes the factor models typically used in panel data settings. We propose an estimator, Network Synthetic Interventions (NSI), and show that it consistently estimates the mean outcomes for a unit under an arbitrary set of counterfactual treatments for the network. We further establish that the estimator is asymptotically normal. We furnish two validity tests for whether the NSI estimator reliably generalizes to produce accurate counterfactual estimates. We provide a novel graph-based experiment design that guarantees the NSI estimator produces accurate counterfactual estimates, and also analyze the sample complexity of the proposed design. We conclude with simulations that corroborate our theoretical findings.
Full Text Available
CausalSim: A Causal Framework for Unbiased Trace-Driven Simulation

Alomar, Abdullah; Hamadanian, Pouya; Nasr-Esfahany, Arash; Agarwal, Anish; Alizadeh, Mohammad; Shah, Devavrat (April 2023, 20th USENIX Symposium on Networked Systems Design and Implementation (NSDI 23))

Full Text Available
tspDB: Time Series Predict DB

Agarwal, Anish and (January 2021, Proceedings of Machine Learning Research)

A major bottleneck of the current Machine Learning (ML) workflow is the time-consuming, error-prone engineering required to get data from a datastore or a database (DB) to the point where an ML algorithm can be applied to it. This is further exacerbated because ML algorithms are now trained on large volumes of data, yet we need predictions in real time, especially in a variety of time-series applications such as finance and real-time control systems. Hence, we explore the feasibility of directly integrating prediction functionality on top of a data store or DB. Such a system ideally: (i) provides an intuitive prediction query interface that alleviates the unwieldy data engineering; (ii) provides state-of-the-art statistical accuracy while ensuring incremental model updates, low model training time, and low latency for making predictions. As the main contribution, we explicitly instantiate a proof of concept, tspDB, which directly integrates with PostgreSQL. We rigorously test tspDB’s statistical and computational performance against state-of-the-art time series algorithms, including a Long Short-Term Memory (LSTM) neural network and DeepAR (an industry-standard deep learning library by Amazon). Statistically, on standard time series benchmarks, tspDB outperforms LSTM and DeepAR with 1.1-1.3x higher relative accuracy. Computationally, tspDB is 59-62x and 94-95x faster than LSTM and DeepAR in terms of median ML model training time and prediction query latency, respectively. Further, compared to PostgreSQL’s bulk insert time and its SELECT query latency, tspDB is slower by only 1.3x and 2.6x, respectively. That is, tspDB is a real-time prediction system in that its model training and prediction query times are similar to those of simply inserting data into, or reading data from, a DB. As an algorithmic contribution, we introduce an incremental multivariate matrix factorization based time series method, on which tspDB is built. We show this method also allows one to produce reliable prediction intervals by accurately estimating the time-varying variance of a time series, thereby addressing an important problem in time series analysis.
Full Text Available
Model Agnostic Time Series Analysis via Matrix Estimation

https://doi.org/10.1145/3287319

Agarwal, Anish; Amjad, Muhammad Jehangir; Shah, Devavrat; Shen, Dennis (December 2018, Proceedings of the ACM on Measurement and Analysis of Computing Systems)

Full Text Available
Model Agnostic Time Series Analysis via Matrix Estimation

Agarwal, Anish; Amjad, Muhammad Jehangir; Shah, Devavrat; Shen, Dennis (December 2018, ACM SIGMETRICS performance evaluation review)

We propose an algorithm to impute and forecast a time series by transforming the observed time series into a matrix, utilizing matrix estimation to recover missing values and de-noise observed entries, and performing linear regression to make predictions. At the core of our analysis is a representation result, which states that for a large class of models, the transformed time series matrix is (approximately) low-rank. In effect, this generalizes the widely used Singular Spectrum Analysis (SSA) in the time series literature, and allows us to establish a rigorous link between time series analysis and matrix estimation. The key to establishing this link is constructing a Page matrix with non-overlapping entries rather than a Hankel matrix as is commonly done in the literature (e.g., SSA). This particular matrix structure allows us to provide finite sample analysis for imputation and prediction, and prove the asymptotic consistency of our method. Another salient feature of our algorithm is that it is model agnostic with respect to both the underlying time dynamics and the noise distribution in the observations. The noise agnostic property of our approach allows us to recover the latent states when only given access to noisy and partial observations a la a Hidden Markov Model; e.g., recovering the time-varying parameter of a Poisson process without knowing that the underlying process is Poisson. Furthermore, since our forecasting algorithm requires regression with noisy features, our approach suggests a matrix estimation based method—coupled with a novel, non-standard matrix estimation error metric—to solve the error-in-variable regression problem, which could be of interest in its own right. Through synthetic and real-world datasets, we demonstrate that our algorithm outperforms standard software packages (including R libraries) in the presence of missing data as well as high levels of noise.
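The first two steps described in the abstract above (building a Page matrix from non-overlapping segments, then applying matrix estimation to impute and de-noise) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `page_matrix` and `low_rank_estimate` are hypothetical, and truncated SVD with rescaling by the observed fraction is assumed here as one simple stand-in for the matrix estimation step. The regression-based forecasting step is omitted.

```python
import numpy as np


def page_matrix(series, rows):
    """Arrange a series into a Page matrix with `rows` rows.

    Column j holds the j-th consecutive, non-overlapping segment of
    length `rows`; any leftover tail shorter than `rows` is dropped.
    """
    series = np.asarray(series, dtype=float)
    cols = len(series) // rows
    return series[: rows * cols].reshape(cols, rows).T


def low_rank_estimate(matrix, rank):
    """Impute and de-noise by truncated SVD (an assumed stand-in estimator).

    Missing entries are marked with NaN. They are filled with zero, the
    matrix is truncated to its top `rank` singular components, and the
    result is rescaled by the observed fraction of entries.
    """
    observed = ~np.isnan(matrix)
    fraction_observed = observed.mean()
    filled = np.where(observed, matrix, 0.0)
    u, s, vt = np.linalg.svd(filled, full_matrices=False)
    return (u[:, :rank] * s[:rank]) @ vt[:rank] / fraction_observed


# A single sinusoid yields a Page matrix of rank 2.
t = np.arange(200)
signal = np.sin(0.3 * t)
clean = page_matrix(signal, rows=10)  # shape (10, 20)
singular_values = np.linalg.svd(clean, compute_uv=False)

# Add noise, hide roughly 20% of the entries at random, then estimate.
rng = np.random.default_rng(0)
noisy = clean + 0.1 * rng.standard_normal(clean.shape)
noisy[rng.random(clean.shape) < 0.2] = np.nan
estimate = low_rank_estimate(noisy, rank=2)
```

The sinusoid example reflects the low-rank representation idea in a simple case: entry (i, j) of the Page matrix is sin(0.3(i + 10j)), which splits into two separable terms, so only two singular values are non-negligible. The rank passed to `low_rank_estimate` is chosen by hand here; how to select it in practice is outside the scope of this sketch.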
Full Text Available